Cognitive policy learner: biasing winning or losing strategies

Authors

Dominik Dahlem

Jim Dowling

William Harrison

Abstract

In continuous learning settings, stochastic stable policies are often necessary to ensure that agents continuously adapt to dynamic environments. The choice of the decentralised learning system and the employed policy plays an important role in the optimisation task. For example, a policy that exhibits fluctuations may introduce non-linear effects that other agents in the environment cannot cope with and may even amplify. In dynamic and unpredictable multiagent environments, these oscillations may introduce instabilities. In this paper, we take inspiration from the limbic system to introduce an extension to the weighted policy learner, where agents evaluate rewards as either positive or negative feedback, depending on how they deviate from average expected rewards. Agents have positive and negative biases, where a bias either magnifies or depresses a positive or negative feedback signal. To contain the non-linear effects of biased rewards, we incorporate a decaying memory of past positive and negative feedback signals to provide a smoother gradient update on the probability simplex, spreading out the effect of the feedback signal over time. By splitting the feedback signal, more leverage on the win or learn fast (WoLF) principle is possible. The cognitive policy learner is evaluated using a small queueing network and compared with the fair action learner and the weighted policy learner. Emphasis is placed on analysing the dynamics of the learning algorithms with respect to the stability of the queueing network and the overall queueing performance.
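
The abstract does not give the update equations, so the following is only a rough sketch of the ideas it names: rewards are split into positive and negative feedback relative to a running average, each side is scaled by its own bias, decaying memories smooth the signal, and the policy is updated and projected back onto the probability simplex. The class name `BiasedFeedbackLearner`, the parameters `pos_bias`, `neg_bias`, `decay`, `step` and `avg_rate`, and the exact update rule are all assumptions made for illustration, not the authors' algorithm.

```python
def project_to_simplex(v):
    """Euclidean projection of a list of floats onto the probability simplex."""
    u = sorted(v, reverse=True)
    css = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        css += u_k
        t = (css - 1.0) / k
        if u_k - t > 0:  # holds for k = 1..rho; keep the last valid threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


class BiasedFeedbackLearner:
    """Hypothetical learner illustrating split, biased, decaying feedback."""

    def __init__(self, n_actions, pos_bias=1.0, neg_bias=1.0,
                 decay=0.9, step=0.05, avg_rate=0.1):
        self.policy = [1.0 / n_actions] * n_actions
        self.avg_reward = 0.0               # running estimate of expected reward
        self.pos_trace = [0.0] * n_actions  # decaying memory of positive feedback
        self.neg_trace = [0.0] * n_actions  # decaying memory of negative feedback
        self.pos_bias = pos_bias            # assumed: scales positive feedback
        self.neg_bias = neg_bias            # assumed: scales negative feedback
        self.decay = decay
        self.step = step
        self.avg_rate = avg_rate

    def update(self, action, reward):
        # Classify the reward by its deviation from the running average.
        deviation = reward - self.avg_reward
        # Decay both memories so old feedback fades over time.
        self.pos_trace = [self.decay * x for x in self.pos_trace]
        self.neg_trace = [self.decay * x for x in self.neg_trace]
        if deviation >= 0:
            self.pos_trace[action] += self.pos_bias * deviation
        else:
            self.neg_trace[action] += self.neg_bias * (-deviation)
        # Smoothed gradient: remembered positive minus remembered negative feedback.
        gradient = [p - n for p, n in zip(self.pos_trace, self.neg_trace)]
        moved = [pi + self.step * g for pi, g in zip(self.policy, gradient)]
        self.policy = project_to_simplex(moved)
        # Update the running average after using it as the baseline.
        self.avg_reward += self.avg_rate * deviation
        return self.policy
```

In this sketch, setting `neg_bias` above `pos_bias` makes the policy move away from a losing action faster than it moves towards a winning one, which is one possible way to read the "more leverage on WoLF" remark. The projection lets probabilities reach zero, so an implementation that needs to stay stochastic would have to add a floor or a mixing term.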

Similar resources

Winning, Losing and Drawing in Concurrent Games with Perfect or Imperfect Information

Nondeterministic concurrent strategies—those strategies compatible with copy-cat behaving as identity w.r.t. composition—have been characterised as certain maps of event structures. This leads to a bicategory of general concurrent games in which the maps are nondeterministic concurrent strategies. This paper explores the consequences of extending concurrent games with (1) winning, losing and, i...

Gambling warning messages: The impact of winning and losing on message reception across a gambling session.

Gambling warning messages have been shown to lead to prevention and modification of risk-taking behaviors. Laboratory studies have shown messages can increase a player's knowledge about gambling specific risks, modify their gambling-related cognitive distortions, and even change play. In the present laboratory study, participants were randomly assigned to a winning or losing slot machine gambli...

A Survey of the Performance Profile of Winning and Losing Male Karatekas in the World Leagues 2017 and 2018

Introduction & Objective: Technical and tactical performance analysis of athletes is always considered to improve their performance and other athletes. The purpose of this study was to evaluate the technical and tactical performance profile of elite karate players in different weight groups. Tools and Methods: In this descriptive study, the functional profile of the winning and losing karate pl...

Analysis of Translation Strategies Employed in Awards-winning Subtitled Dramas

The increasing impact of audiovisual media and film industry in particular has led researchers to think of audiovisual translation strategies. Huge investments in film industry need global markets. Hence, there is a need for qualified translations and systematic studies dedicated to this area are in great demand. This study aims to investigate translation strategies adopted in the translation o...

Parrondo's Paradoxical Games and the Discrete Brownian Ratchet

We introduce Parrondo's paradox that involves games of chance. We consider two fair games, A and B, both of which can be made to lose by changing a biasing parameter. The apparently paradoxical situation arises when the two games are played in any alternating order. A winning expectation is produced, even though both games A and B are losing when we play them individually. We develop an explana...

Publication date: 2011
